Topographic Mapping of Large Dissimilarity Data Sets
Abstract
Topographic maps such as the self-organizing map (SOM) or neural gas (NG) constitute powerful data mining techniques that simultaneously cluster data and infer their topological structure, such that additional features, for example, browsing, become available. Both methods have been introduced for vectorial data sets; they require a classical feature encoding of information. Often data are available in the form of pairwise distances only, such as arise from a kernel matrix, a graph, or some general dissimilarity measure. In such cases, NG and SOM cannot be applied directly. In this article, we introduce relational topographic maps as an extension of relational clustering algorithms, which offer prototype-based representations of dissimilarity data, to incorporate neighborhood structure. These methods are equivalent to the standard (vectorial) techniques if a Euclidean embedding exists, while avoiding the need to explicitly compute such an embedding. Extending these techniques to the general case of non-Euclidean dissimilarities makes it possible to interpret relational clustering as clustering in pseudo-Euclidean space. We compare the methods to well-known clustering methods for proximity data based on deterministic annealing and discuss how far convergence can be guaranteed in the general case. Relational clustering is quadratic in the number of data points, which makes the algorithms infeasible for huge data sets. We propose an approximate patch version of relational clustering that runs in linear time. The effectiveness of the methods is demonstrated in a number of examples.
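To make the idea of clustering from pairwise dissimilarities alone more concrete, the following is a minimal sketch of a batch neural gas variant that never touches feature vectors. It relies on the standard relational clustering identity: if a prototype is an implicit convex-style combination w_j = sum_i alpha_ji x_i with sum_i alpha_ji = 1, and D holds squared Euclidean distances, then ||x_i - w_j||^2 = [D alpha_j]_i - 1/2 alpha_j^T D alpha_j. The function names, the random initialization, and the annealing schedule are illustrative assumptions, not taken from the article, and the sketch covers only the quadratic-time version, not the patch approximation.

```python
import numpy as np


def relational_distances(D, alpha):
    """Squared distances between data points and implicit prototypes.

    D:     (n, n) symmetric matrix of squared dissimilarities.
    alpha: (k, n) coefficient rows, each summing to 1, so that prototype j
           is implicitly sum_i alpha[j, i] * x_i.
    Returns a (k, n) array whose entry [j, i] is d(x_i, w_j).
    """
    cross = alpha @ D  # entry [j, i] = [D alpha_j]_i (D is symmetric)
    self_term = 0.5 * np.einsum("jn,jn->j", cross, alpha)  # 1/2 alpha_j^T D alpha_j
    return cross - self_term[:, None]


def relational_neural_gas(D, k, epochs=50, seed=0):
    """Batch neural gas driven only by D (hypothetical helper, a sketch)."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    # Assumption: random normalized coefficients as a starting point.
    alpha = rng.random((k, n))
    alpha /= alpha.sum(axis=1, keepdims=True)
    lambda0 = k / 2.0
    for t in range(epochs):
        # Assumption: neighbourhood range annealed multiplicatively to 0.01.
        lam = lambda0 * (0.01 / lambda0) ** (t / max(epochs - 1, 1))
        dist = relational_distances(D, alpha)
        # rank[j, i] = number of prototypes closer to point i than prototype j
        rank = np.argsort(np.argsort(dist, axis=0), axis=0)
        h = np.exp(-rank / lam)
        mass = h.sum(axis=1, keepdims=True)
        # Prototypes whose weights underflow to zero keep their old coefficients.
        alive = mass[:, 0] > 0
        alpha[alive] = h[alive] / mass[alive]
    return alpha


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy data: two blobs, used only to build a dissimilarity matrix.
    X = np.vstack([rng.normal(scale=0.3, size=(15, 2)) + [-4.0, 0.0],
                   rng.normal(scale=0.3, size=(15, 2)) + [4.0, 0.0]])
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    alpha = relational_neural_gas(D, k=2)
    print(relational_distances(D, alpha).argmin(axis=0))
```

Each epoch costs O(k n^2) because of the product `alpha @ D`, which illustrates the quadratic scaling in the number of data points that the abstract mentions. For non-Euclidean dissimilarities the same formula can still be evaluated, but the resulting values need not be nonnegative.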
Related articles
Linear Time Heuristics for Topographic Mapping of Dissimilarity Data
Topographic mapping offers an intuitive interface to inspect large quantities of electronic data. Recently, it has been extended to data described by general dissimilarities rather than Euclidean vectors. Unlike its Euclidean counterpart, the technique has quadratic time complexity due to the underlying quadratic dissimilarity matrix. Thus, it is infeasible already for medium-sized data sets. W...
Topographic Mapping of Dissimilarity Data
Topographic mapping offers a very flexible tool to inspect large quantities of high-dimensional data in an intuitive way. Often, electronic data are inherently non-Euclidean and modern data formats are connected to dedicated non-Euclidean dissimilarity measures for which classical topographic mapping cannot be used. We give an overview of extensions of topographic mapping to general dissimil...
The Nyström approximation for relational generative topographic mappings
Relational generative topographic mappings (RGTM) provide a statistically motivated data inspection and visualization tool for pairwise dissimilarities by fitting a constrained Gaussian mixture model to the data. Since it is based on pairwise dissimilarities of data, it scales quadratically with the number of training samples, making the method infeasible for large data sets. In this contributio...
Linear Time Relational Prototype Based Learning
Prototype based learning offers an intuitive interface to inspect large quantities of electronic data in supervised or unsupervised settings. Recently, many techniques have been extended to data described by general dissimilarities rather than Euclidean vectors, so-called relational data settings. Unlike the Euclidean counterparts, the techniques have quadratic time complexity due to the underl...
Adaptive prototype-based dissimilarity learning
In this thesis we focus on prototype-based learning techniques, namely three unsupervised techniques: generative topographic mapping (GTM), neural gas (NG) and affinity propagation (AP), and two supervised techniques: generalized learning vector quantization (GLVQ) and robust soft learning vector quantization (RSLVQ). We extend their abilities with respect to the following central aspects: • Ap...
Journal title:
- Neural Computation
Volume 22, Issue 9
Pages: -
Publication date: 2010